Search | Virtual Health Library

1.

False discovery rates of qpAdm-based screens for genetic admixture.

Yüncü, Eren; Isildak, Ulas; Williams, Matthew P; Huber, Christian D; Flegontova, Olga; Vyazov, Leonid A; Changmai, Piya; Flegontov, Pavel.

bioRxiv ; 2023 Oct 18.

Article in English | MEDLINE | ID: mdl-37904998

ABSTRACT

Although a broad range of methods exists for reconstructing population history from genome-wide single nucleotide polymorphism data, only a few methods have gained popularity in archaeogenetics: principal component analysis (PCA); ADMIXTURE, an algorithm that models individuals as mixtures of multiple ancestral sources represented by actual or inferred populations; formal tests for admixture such as f3-statistics and D/f4-statistics; and qpAdm, a tool for fitting two-component and more complex admixture models to groups or individuals. Despite their popularity in archaeogenetics, which is explained by modest computational requirements and the ability to analyze data of various types and qualities, protocols relying on qpAdm that screen numerous alternative models of varying complexity and find "fitting" models (often considering both estimated admixture proportions and p-values as a composite criterion of model fit) remain untested on complex simulated population histories in the form of admixture graphs of random topology. We analyzed genotype data extracted from such simulations and tested various types of high-throughput qpAdm protocols ("rotating" and "non-rotating", with or without temporal stratification of target groups and proxy ancestry sources, and with or without a "model competition" step). We caution that high-throughput qpAdm protocols may be inappropriate for exploratory analyses in poorly studied regions or periods, since their false discovery rates varied between 12% and 68% depending on the details of the protocol and on the amount and quality of simulated data (i.e., >12% of fitting two-way admixture models imply gene flows that were not simulated).
We demonstrate that, to reduce the false discovery rates of qpAdm protocols to nearly 0%, it is advisable to use large SNP sets with low missing data rates, the rotating qpAdm protocol with a strictly enforced rule that target groups do not pre-date their proxy sources, and an unsupervised ADMIXTURE analysis as a way to verify feasible qpAdm models. Our study has a number of limitations: for instance, these recommendations depend on the assumption that the underlying genetic history is a complex admixture graph and not a stepping-stone model.
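The false discovery rate quoted above is a simple ratio: among qpAdm models that pass the fit criteria, the share that imply gene flows absent from the simulated graph. The Python sketch below assumes a hypothetical representation in which each two-way model is a `(target, source1, source2)` tuple and `valid_sources` maps each target to the proxy groups that the simulation supports as its ancestry sources; the study's actual classification of true and false models may be more detailed.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical representation of a fitting two-way qpAdm model: (target, source1, source2).
Model = Tuple[str, str, str]


def false_discovery_rate(fitting: Iterable[Model],
                         valid_sources: Dict[str, Set[str]]) -> float:
    """Share of fitting models that imply at least one gene flow absent from the simulation.

    `valid_sources[target]` is an assumed lookup of the proxy groups that the
    simulated admixture graph supports as ancestry sources for `target`.
    """
    models = list(fitting)
    if not models:
        return 0.0  # no fitting models, so no false discoveries
    n_false = sum(
        1 for target, s1, s2 in models
        if not {s1, s2} <= valid_sources.get(target, set())
    )
    return n_false / len(models)


# Usage with made-up group names: two of the four fitting models are false.
fitting = [("T1", "A", "B"), ("T1", "A", "C"), ("T2", "D", "E"), ("T2", "A", "E")]
valid = {"T1": {"A", "B"}, "T2": {"D", "E"}}
print(false_discovery_rate(fitting, valid))  # 0.5
```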

2.

Modeling of African population history using f-statistics is biased when applying all previously proposed SNP ascertainment schemes.

Flegontov, Pavel; Isildak, Ulas; Maier, Robert; Yüncü, Eren; Changmai, Piya; Reich, David.

PLoS Genet ; 19(9): e1010931, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37676865

ABSTRACT

f-statistics have emerged as a first line of analysis for making inferences about demographic history from genome-wide data. They are guaranteed to allow robust tests of the fit of proposed models of population history to data when analyzing full genome sequencing data, that is, all single nucleotide polymorphisms (SNPs) in the individuals being analyzed. They are also guaranteed to allow robust tests of models for SNPs ascertained as polymorphic in a population that is an outgroup, in a phylogenetic sense, to all groups being analyzed. True "outgroup ascertainment" is in practice impossible in humans because our species arose from a substructured ancestral population that does not descend from a homogeneous ancestral population going back many hundreds of thousands of years into the past. However, initial studies suggested that non-outgroup-ascertainment schemes might produce sufficiently robust results using f-statistics, and that motivated widespread fitting of models to data using non-outgroup-ascertained SNP panels such as the "Affymetrix Human Origins array", which has been genotyped on thousands of modern individuals from hundreds of populations, or the "1240k" in-solution enrichment reagent, which has been the source of about 70% of published genome-wide data for ancient humans. In this study, we show that while analyses of population history using such panels work well for studies of relationships among non-African populations and one African outgroup, when co-modeling more than one sub-Saharan African and/or archaic human group (Neanderthals and Denisovans), fitting of f-statistics to such SNP sets is expected to frequently lead to false rejection of true demographic histories and to failure to reject incorrect models. Analyzing panels of SNPs polymorphic in archaic humans, which has been suggested as a solution for the ascertainment problem, has limited statistical power and retains important biases.
However, by carrying out simulations of diverse demographic histories, we show that bias in inferences based on f-statistics can be minimized by ascertaining on variants common in a union of diverse African groups; such ascertainment retains high statistical power while allowing co-analysis of archaic and modern groups.
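For readers new to f-statistics, the two statistics named in these abstracts reduce to averages of products of allele-frequency differences across SNPs: f4(A, B; C, D) is the mean of (pA - pB)(pC - pD), and f3(C; A, B) is the mean of (pC - pA)(pC - pB), where a significantly negative f3 indicates that C is admixed. The sketch below computes uncorrected point estimates only; tools such as ADMIXTOOLS additionally apply finite-sample bias corrections and block-jackknife standard errors, which are omitted here. Because the averages run over whichever SNPs are supplied, the choice of ascertained SNPs directly shapes the result, which is the bias these studies examine.

```python
from statistics import fmean
from typing import Sequence


def f4(pa: Sequence[float], pb: Sequence[float],
       pc: Sequence[float], pd: Sequence[float]) -> float:
    """f4(A, B; C, D): mean over SNPs of (pA - pB) * (pC - pD)."""
    return fmean((a - b) * (c - d) for a, b, c, d in zip(pa, pb, pc, pd))


def f3(pc: Sequence[float], pa: Sequence[float], pb: Sequence[float]) -> float:
    """f3(C; A, B): mean over SNPs of (pC - pA) * (pC - pB), without bias correction."""
    return fmean((c - a) * (c - b) for c, a, b in zip(pc, pa, pb))


# Toy allele frequencies at three SNPs; C is an exact 50/50 mixture of A and B.
pa = [0.2, 0.8, 0.5]
pb = [0.6, 0.2, 0.1]
pc = [(a + b) / 2 for a, b in zip(pa, pb)]
print(f3(pc, pa, pb) < 0)  # True: the mixture yields a negative f3
```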

Subjects

African People , Demography , Phylogeny , Polymorphism, Single Nucleotide , Animals , Humans , Black People/genetics , Chromosome Mapping , Genotype , Neanderthals/genetics , Polymorphism, Single Nucleotide/genetics , African People/genetics , Demography/history , Biological Variation, Population/genetics , Models, Statistical , Bias

3.

Evolutionary paths to mammalian longevity through the lens of gene expression.

Isildak, Ulas; Dönertas, Handan Melike.

EMBO J ; 42(17): e114879, 2023 Sep 04.

Article in English | MEDLINE | ID: mdl-37519235

ABSTRACT

The natural variation in mammalian longevity and its underlying mechanisms remain an active area of aging research. In the latest issue of The EMBO Journal, Liu et al (2023) analyze gene expression levels in 103 mammalian species across three tissues, revealing tissue-specific associations between gene expression patterns and longevity. Remarkably, the study suggests that methionine restriction, a strategy shown to increase lifespan, may extend beyond artificial interventions and is similarly employed by natural selection.

Subjects

Longevity , Methionine , Animals , Longevity/genetics , Methionine/genetics , Methionine/metabolism , Mammals/genetics , Gene Expression , Aging/genetics , Aging/metabolism

4.

Author Correction: Temporal changes in the gene expression heterogeneity during brain development and aging.

Isildak, Ulas; Somel, Mehmet; Thornton, Janet M; Dönertas, Handan Melike.

Sci Rep ; 13(1): 10157, 2023 Jun 22.

Article in English | MEDLINE | ID: mdl-37349363

5.

On the limits of fitting complex models of population history to f-statistics.

Maier, Robert; Flegontov, Pavel; Flegontova, Olga; Isildak, Ulas; Changmai, Piya; Reich, David.

Elife ; 12, 2023 Jun 29.

Article in English | MEDLINE | ID: mdl-37057893

ABSTRACT

Our understanding of population history in deep time has been assisted by fitting admixture graphs (AGs) to data: models that specify the ordering of population splits and mixtures, which, together with the amount of genetic drift and the mixture proportions, is the only information needed to predict the patterns of allele frequency correlation among populations. The space of possible AGs relating populations is vast, and thus most published studies have identified fitting AGs through a manual process driven by prior hypotheses, leaving the majority of alternative models unexplored. Here, we develop a method for systematically searching the space of all AGs that can incorporate non-genetic information in the form of topology constraints. We implement this findGraphs tool within a software package, ADMIXTOOLS 2, which is a reimplementation of the ADMIXTOOLS software with new features and large performance gains. We apply this methodology to identify alternative models to AGs that played key roles in eight publications and find that in nearly all cases many alternative models fit nominally or significantly better than the published one. Our results suggest that strong claims about population history from AGs should only be made when all well-fitting and temporally plausible models share common topological features. Our re-evaluation of published data also provides insight into the population histories of humans, dogs, and horses, identifying features that are stable across the models we explored, as well as scenarios of population relationships that differ in important ways from models that have been highlighted in the literature.
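The statement that topology and drift lengths are all that is needed to predict allele-frequency correlations can be illustrated in the admixture-free special case. In the sketch below, which is not code from findGraphs or ADMIXTOOLS 2, a tree is a hypothetical mapping from each node to its parent and the drift length of the edge above it; f2 between two leaves is the total drift on the path between them, and f4 follows from the identity f4(A, B; C, D) = [f2(A, D) + f2(B, C) - f2(A, C) - f2(B, D)] / 2. Admixture edges, which would turn each f2 into a mixture-weighted combination of paths, are omitted.

```python
from typing import Dict, Tuple

# Hypothetical tree encoding: child -> (parent, drift length of the edge above child).
Tree = Dict[str, Tuple[str, float]]


def _drift_to_ancestors(tree: Tree, node: str) -> Dict[str, float]:
    """Map node and each of its ancestors to the drift accumulated from node up to it."""
    dist = {node: 0.0}
    total = 0.0
    while node in tree:
        parent, length = tree[node]
        total += length
        node = parent
        dist[node] = total
    return dist


def f2(tree: Tree, a: str, b: str) -> float:
    """Expected f2(a, b) on a tree: total drift on the path between a and b."""
    up_a, up_b = _drift_to_ancestors(tree, a), _drift_to_ancestors(tree, b)
    # The lowest common ancestor is the shared ancestor closest to a.
    lca = min((n for n in up_a if n in up_b), key=lambda n: up_a[n])
    return up_a[lca] + up_b[lca]


def f4(tree: Tree, a: str, b: str, c: str, d: str) -> float:
    """Expected f4(a, b; c, d) from the f2 identity."""
    return 0.5 * (f2(tree, a, d) + f2(tree, b, c) - f2(tree, a, c) - f2(tree, b, d))


# Tree ((A, B)X, (C, D)Y)R with made-up drift lengths.
tree = {"A": ("X", 0.1), "B": ("X", 0.2), "C": ("Y", 0.3), "D": ("Y", 0.4),
        "X": ("R", 0.05), "Y": ("R", 0.05)}
print(abs(f4(tree, "A", "B", "C", "D")) < 1e-12)  # True: A and B form a clade
print(round(f4(tree, "A", "C", "B", "D"), 6))  # 0.1: drift on the internal path X-R-Y
```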

Subjects

Genetics, Population , Hominidae , Humans , Dogs , Animals , Horses , Gene Frequency , Software , Genetic Drift , Models, Genetic

6.

Modeling of African population history using f-statistics can be highly biased and is not addressed by previously suggested SNP ascertainment schemes.

Flegontov, Pavel; Isildak, Ulas; Maier, Robert; Yüncü, Eren; Changmai, Piya; Reich, David.

bioRxiv ; 2023 Jan 22.

Article in English | MEDLINE | ID: mdl-36711923

ABSTRACT

f-statistics have emerged as a first line of analysis for making inferences about demographic history from genome-wide data. These statistics can provide strong evidence for either admixture or cladality, and this evidence can be robust to substantial rates of error or missing data. f-statistics are guaranteed to be unbiased under "SNP ascertainment" (analyzing non-randomly chosen subsets of single nucleotide polymorphisms) only if the ascertainment relies on a population that is an outgroup for all groups analyzed. However, ascertainment on a true outgroup that is not co-analyzed with other populations is often impractical and uncommon in the literature. In this study, focused on practical rather than theoretical aspects of SNP ascertainment, we show that many non-outgroup ascertainment schemes lead to false rejection of true demographic histories, as well as to failure to reject incorrect models. That said, the bias introduced by common ascertainments such as the 1240K panel is mostly limited to situations in which more than one sub-Saharan African and/or archaic human group (Neanderthals and Denisovans) or non-human outgroups are co-modelled, for example, f4-statistics involving one non-African group, two African groups, and one archaic group. Analyzing panels of SNPs polymorphic in archaic humans, which has been suggested as a solution for the ascertainment problem, cannot fix all these problems: for some classes of f-statistics it is not a clean outgroup ascertainment, and in other cases it has relatively low power to reject incorrect demographic models because it provides a relatively small number of variants common in anatomically modern humans. Moreover, due to the paucity of high-coverage archaic genomes, archaic individuals used for ascertainment often act as the sole representatives of their respective groups in an analysis, and we show that this approach is highly problematic.
By carrying out large numbers of simulations of diverse demographic histories, we find that bias in inferences based on f-statistics introduced by non-outgroup ascertainment can be minimized if the derived allele frequency spectrum in the population used for ascertainment approaches the spectrum that existed at the root of all groups being co-analyzed. Ascertaining on sites with variants common in a diverse group of African individuals provides a good approximation to such a set of SNPs, addressing the great majority of biases and also retaining high statistical power for studying population history. Such a "pan-African" ascertainment, although not completely problem-free, allows unbiased exploration of demographic models for the widest set of archaic and modern human populations, as compared to the other ascertainment schemes we explored.
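A "pan-African" ascertainment amounts to a SNP filter applied before any f-statistic is computed. The sketch below keeps SNPs whose minor allele frequency, pooled across an ascertainment panel of African groups, is at least a threshold. The equal weighting of groups, the 5% threshold, the function name and the group labels are illustrative assumptions; the abstract does not state the study's exact criteria.

```python
from typing import Dict, List, Sequence


def ascertain_common(freqs: Dict[str, Sequence[float]], panel: Sequence[str],
                     maf_min: float = 0.05) -> List[int]:
    """Indices of SNPs whose pooled minor allele frequency in `panel` is >= maf_min.

    `freqs[group][i]` is the allele frequency of SNP i in `group`. Groups are
    weighted equally here, an assumption; pooling by sample size is an alternative.
    """
    n_snps = len(freqs[panel[0]])
    kept = []
    for i in range(n_snps):
        p = sum(freqs[g][i] for g in panel) / len(panel)  # pooled frequency
        if min(p, 1 - p) >= maf_min:
            kept.append(i)
    return kept


# Hypothetical frequencies at four SNPs in two African groups.
freqs = {"AFR1": [0.0, 0.5, 0.02, 0.9], "AFR2": [0.0, 0.3, 0.04, 0.8]}
print(ascertain_common(freqs, ["AFR1", "AFR2"]))  # [1, 3]
```

f-statistics for any set of groups would then be computed only on the returned SNP indices.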

7.

Somatic copy number variant load in neurons of healthy controls and Alzheimer's disease patients.

Turan, Zeliha Gözde; Richter, Vincent; Bochmann, Jana; Parvizi, Poorya; Yapar, Etka; Isildak, Ulas; Waterholter, Sarah-Kristin; Leclere-Turbant, Sabrina; Son, Çagdas Devrim; Duyckaerts, Charles; Yet, Idil; Arendt, Thomas; Somel, Mehmet; Ueberham, Uwe.

Acta Neuropathol Commun ; 10(1): 175, 2022 Nov 30.

Article in English | MEDLINE | ID: mdl-36451207

ABSTRACT

The possible role of somatic copy number variations (CNVs) in Alzheimer's disease (AD) aetiology has been controversial. Although cytogenetic studies suggested increased CNV loads in AD brains, a recent single-cell whole-genome sequencing (scWGS) experiment, studying frontal cortex brain samples, found no such evidence. Here we readdressed this issue using low-coverage scWGS on pyramidal neurons dissected via both laser capture microdissection (LCM) and fluorescence activated cell sorting (FACS) across five brain regions: entorhinal cortex, temporal cortex, hippocampal CA1, hippocampal CA3, and the cerebellum. Among reliably detected somatic CNVs identified in 1301 cells obtained from the brains of 13 AD patients and 7 healthy controls, deletions were more frequent than duplications. Interestingly, we observed slightly higher frequencies of CNV events in cells from AD patients than in similar numbers of cells from controls (4.1% vs. 1.4%, or 0.9% vs. 0.7%, depending on the filtering approach), although the differences were not statistically significant. On the technical side, we observed that LCM-isolated cells show higher within-cell read depth variation than cells isolated with FACS. To reduce within-cell read depth variation, we proposed a principal component analysis-based denoising approach that significantly improves signal-to-noise ratios. Lastly, we showed that LCM-isolated neurons in AD harbour slightly more read depth variability than neurons of controls, which might be related to the reported hyperploid profiles of some AD-affected neurons.
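The abstract does not detail the PCA-based denoising, so the following is only a generic sketch of the idea, not the authors' method: read depth is arranged as a cells-by-genomic-bins matrix, and the top principal components, assumed here to capture technical variation shared across cells, are subtracted. The matrix layout, the normalized (e.g. log-scale) input, the function name and the choice of `k` are assumptions.

```python
import numpy as np


def pca_denoise(depth: np.ndarray, k: int) -> np.ndarray:
    """Subtract the top-k principal components of a cells x bins read-depth matrix.

    Assumes rows are cells, columns are genomic bins, and values are normalized
    read depths. Real CNV signals shared by many cells would also be removed,
    so k should stay small.
    """
    centered = depth - depth.mean(axis=0, keepdims=True)  # center each bin across cells
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    shared = (u[:, :k] * s[:k]) @ vt[:k]  # rank-k part of the across-cell variation
    return depth - shared


# Simulated example: one technical pattern with cell-specific strength, plus small noise.
rng = np.random.default_rng(0)
pattern = rng.normal(size=200)
strength = np.linspace(-2, 2, 30)
depth = np.outer(strength, pattern) + 0.01 * rng.normal(size=(30, 200))
cleaned = pca_denoise(depth, k=1)
print(cleaned.std(axis=1).mean() < depth.std(axis=1).mean())  # True
```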

Subjects

Alzheimer Disease , Humans , Alzheimer Disease/genetics , DNA Copy Number Variations , Neurons , Entorhinal Cortex , Brain

8.

Inter-tissue convergence of gene expression during ageing suggests age-related loss of tissue and cellular identity.

Izgi, Hamit; Han, Dingding; Isildak, Ulas; Huang, Shuyun; Kocabiyik, Ece; Khaitovich, Philipp; Somel, Mehmet; Dönertas, Handan Melike.

Elife ; 11, 2022 Jan 31.

Article in English | MEDLINE | ID: mdl-35098922

ABSTRACT

Developmental trajectories of gene expression may reverse in their direction during ageing, a phenomenon previously linked to cellular identity loss. Our analysis of cerebral cortex, lung, liver, and muscle transcriptomes of 16 mice, covering development and ageing intervals, revealed widespread but tissue-specific ageing-associated expression reversals. Cumulatively, these reversals create a unique phenomenon: mammalian tissue transcriptomes diverge from each other during postnatal development, but during ageing they tend to converge towards similar expression levels, a process we term Divergence followed by Convergence (DiCo). We found that DiCo was most prevalent among tissue-specific genes and associated with loss of tissue identity, which we confirmed using independent mouse and human datasets. Further, using publicly available single-cell transcriptome data, we showed that DiCo could be driven both by alterations in tissue cell-type composition and by cell-autonomous expression changes within particular cell types.

Subjects

Aging , Transcriptome , Aging/genetics , Animals , Liver , Mammals/genetics , Mice

9.

Distinguishing between recent balancing selection and incomplete sweep using deep neural networks.

Isildak, Ulas; Stella, Alessandro; Fumagalli, Matteo.

Mol Ecol Resour ; 21(8): 2706-2718, 2021 Nov.

Article in English | MEDLINE | ID: mdl-33749134

ABSTRACT

Balancing selection is an important adaptive mechanism underpinning a wide range of phenotypes. Despite its relevance, the detection of recent balancing selection from genomic data is challenging, as its signatures are qualitatively similar to those left by ongoing positive selection. In this study, we developed and implemented two deep neural networks and tested their performance in predicting loci under recent selection, either due to balancing selection or incomplete sweep, from population genomic data. Specifically, we generated forward-in-time simulations to train and test an artificial neural network (ANN) and a convolutional neural network (CNN). The ANN received as input multiple summary statistics calculated on the locus of interest, while the CNN was applied directly to the matrix of haplotypes. We found that both architectures have high accuracy in identifying loci under recent selection. The CNN generally outperformed the ANN in distinguishing between signals of balancing selection and incomplete sweep and was less affected by incorrect training data. We deployed both trained networks on neutral genomic regions in European populations and demonstrated a lower false-positive rate for the CNN than for the ANN. We finally deployed the CNN within the MEFV gene region and identified several common variants predicted to be under incomplete sweep in a European population. Notably, two of these variants are functional changes and could modulate susceptibility to familial Mediterranean fever, possibly as a consequence of past adaptation to pathogens. In conclusion, deep neural networks were able to characterize signals of selection on intermediate-frequency variants, an analysis currently inaccessible to commonly used strategies.
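The ANN's input, summary statistics computed on a locus, can be illustrated with a few standard statistics derived from the same haplotype matrix that the CNN receives directly. The abstract does not list the statistics used, so the choice below (segregating sites, mean pairwise differences π, and Watterson's estimator θ_W) and the function name are assumptions; the matrix is coded 0 for ancestral and 1 for derived alleles, with rows as haplotypes and columns as sites.

```python
from itertools import combinations
from typing import Dict, List


def locus_summary(haps: List[List[int]]) -> Dict[str, float]:
    """Summary statistics for a 0/1 haplotype matrix (rows: haplotypes, columns: sites)."""
    n = len(haps)
    n_sites = len(haps[0])
    # Segregating sites: derived allele present in some but not all haplotypes.
    seg = sum(1 for j in range(n_sites) if 0 < sum(h[j] for h in haps) < n)
    # Nucleotide diversity: mean number of differences over all haplotype pairs.
    pairs = list(combinations(haps, 2))
    pi = sum(sum(x != y for x, y in zip(h1, h2)) for h1, h2 in pairs) / len(pairs)
    # Watterson's estimator: segregating sites scaled by the harmonic number a_n.
    a_n = sum(1 / i for i in range(1, n))
    return {"segregating_sites": seg, "pi": pi, "theta_w": seg / a_n}


haps = [[0, 1, 0], [0, 1, 1], [1, 1, 0], [0, 1, 0]]
stats = locus_summary(haps)
print(stats["segregating_sites"], stats["pi"])  # 2 1.0
```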

Subjects

Genomics , Neural Networks, Computer , Haplotypes , Metagenomics , Phenotype

10.

Temporal changes in the gene expression heterogeneity during brain development and aging.

Isildak, Ulas; Somel, Mehmet; Thornton, Janet M; Dönertas, Handan Melike.

Sci Rep ; 10(1): 4080, 2020 Mar 05.

Article in English | MEDLINE | ID: mdl-32139741

ABSTRACT

Cells in largely non-mitotic tissues such as the brain are prone to stochastic (epi-)genetic alterations that may cause increased variability between cells and individuals over time. Although increased inter-individual heterogeneity in gene expression was previously reported, whether this process starts during development or is restricted to the aging period has not yet been studied. The regulatory dynamics and functional significance of putative aging-related heterogeneity are also unknown. Here we address these questions with a meta-analysis of 19 transcriptome datasets from three independent studies, covering diverse human brain regions. We observed a significant increase in inter-individual heterogeneity during aging (20+ years) compared to postnatal development (0 to 20 years). Increased heterogeneity during aging was consistent among different brain regions at the gene level and associated with lifespan regulation and neuronal functions. Overall, our results show that increased expression heterogeneity is a characteristic of the aging human brain and may influence aging-related changes in brain functions.
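One way to quantify a change in inter-individual heterogeneity with age, sketched here under assumptions rather than as the study's exact procedure, is to fit a gene's expression against age, take the absolute residuals, and check whether they trend upward with age within a period (for example, the 20+ years aging interval). The sketch uses a linear fit and a least-squares slope for simplicity, and the function names and toy values are hypothetical; a positive slope means individuals drift further from the age trend as they get older.

```python
from typing import Sequence


def _slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def heterogeneity_trend(ages: Sequence[float], expr: Sequence[float]) -> float:
    """Slope of |residual| on age after a linear fit of expression on age.

    Positive values indicate increasing inter-individual heterogeneity.
    """
    b = _slope(ages, expr)
    a = sum(expr) / len(expr) - b * sum(ages) / len(ages)  # intercept
    abs_resid = [abs(y - (a + b * x)) for x, y in zip(ages, expr)]
    return _slope(ages, abs_resid)


# Toy aging-period data (ages in years): spread around the trend widens with age.
ages = [20, 30, 40, 50, 60, 70]
expr = [10, 11, 8, 13, 6, 15]
print(heterogeneity_trend(ages, expr) > 0)  # True
```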

Subjects

Aging/genetics , Biomarkers/analysis , Brain/metabolism , Gene Expression Profiling , Gene Expression Regulation, Developmental , Transcriptome , Adolescent , Adult , Age Factors , Aged , Aged, 80 and over , Brain/growth & development , Child , Child, Preschool , Female , Humans , Infant , Infant, Newborn , Male , Middle Aged , Young Adult

11.

ImaGene: a convolutional neural network to quantify natural selection from genomic data.

Torada, Luis; Lorenzon, Lucrezia; Beddis, Alice; Isildak, Ulas; Pattini, Linda; Mathieson, Sara; Fumagalli, Matteo.

BMC Bioinformatics ; 20(Suppl 9): 337, 2019 Nov 22.

Article in English | MEDLINE | ID: mdl-31757205

ABSTRACT

BACKGROUND: The genetic bases of many complex phenotypes are still largely unknown, mostly due to the polygenic nature of the traits and the small effect of each associated mutation. An alternative to classic association studies for determining such genetic bases is an evolutionary framework. As sites targeted by natural selection are likely to harbor important functionalities for the carrier, the identification of selection signatures in the genome has the potential to unveil the genetic mechanisms underpinning human phenotypes. Popular methods of detecting such signals rely on compressing genomic information into summary statistics, resulting in a loss of information. Furthermore, few methods are able to quantify the strength of selection. Here we explored the use of deep learning in evolutionary biology and implemented a program, called ImaGene, to apply convolutional neural networks to population genomic data for the detection and quantification of natural selection. RESULTS: ImaGene enables genomic information from multiple individuals to be represented as abstract images. Each image is created by stacking aligned genomic data and encoding distinct alleles into separate colors. To detect and quantify signatures of positive selection, ImaGene implements a convolutional neural network that is trained using simulations. We show how the method implemented in ImaGene can be affected by data manipulation and learning strategies. In particular, we show how sorting images by row and column leads to accurate predictions. We also demonstrate how misspecification of the demographic model used to produce training data can influence the quantification of positive selection. We finally illustrate an approach to estimate the selection coefficient, a continuous variable, using multiclass classification techniques.
CONCLUSIONS: While the use of deep learning in evolutionary genomics is in its infancy, here we demonstrated its potential to detect informative patterns from large-scale genomic data. We implemented methods to process genomic data for deep learning in a user-friendly program called ImaGene. The joint inference of the evolutionary history of mutations and their functional impact will facilitate mapping studies and provide novel insights into the molecular mechanisms associated with human phenotypes.
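The image representation and the row/column sorting described above can be sketched on a 0/1 haplotype matrix, where each allele state maps to its own pixel value. The frequency-based ordering below (identical haplotypes grouped, most common first; sites ordered by derived-allele count) is one plausible reading of "sorting by row and column" and is not ImaGene's API; the function name `sort_image` is hypothetical.

```python
from collections import Counter

import numpy as np


def sort_image(haps: np.ndarray) -> np.ndarray:
    """Sort a 0/1 haplotype image by row frequency, then by column frequency."""
    # Rows: most common haplotypes first; the stable sort keeps ties in input order.
    keys = [row.tobytes() for row in haps]
    counts = Counter(keys)
    row_order = sorted(range(len(keys)), key=lambda i: -counts[keys[i]])
    out = haps[row_order]
    # Columns: sites with the most derived alleles first (stable for ties).
    col_order = np.argsort(-out.sum(axis=0), kind="stable")
    return out[:, col_order]


haps = np.array([[0, 1, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1]])
image = sort_image(haps) * 255  # derived alleles white (255), ancestral black (0)
print(image.tolist())  # [[255, 0, 0], [255, 0, 0], [255, 255, 0], [0, 0, 255]]
```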

Subjects

Databases, Genetic , Genomics/methods , Neural Networks, Computer , Selection, Genetic , Software , Algorithms , Alleles , Genetics, Population , Humans , Phenotype
